315 research outputs found

    Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition

    The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for CTC, and show it is tractable and competitive for end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing CTC models and most encoder-decoder models, with character error rates (CERs) of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean, with a fixed architecture and one GPU. Similar improvements hold for WERs after LM decoding. We motivate the architecture for speech, evaluate position and downsampling approaches, and explore how label alphabets (character, phoneme, subword) affect attention heads and performance. Comment: Accepted to ICASSP 2019.
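
    The abstract describes a stack of self-attention layers whose per-frame outputs feed a CTC loss directly, with no decoder. As a rough illustration only, here is a minimal PyTorch sketch of that pairing; the layer sizes, label count, and omission of positional encoding and downsampling are placeholders, not the paper's configuration:

        # Minimal sketch: self-attentional encoder + CTC loss (not SAN-CTC itself).
        import torch
        import torch.nn as nn

        class SelfAttentionCTC(nn.Module):
            def __init__(self, n_feats=80, d_model=256, n_heads=4, n_layers=6, n_labels=32):
                super().__init__()
                self.proj = nn.Linear(n_feats, d_model)      # frame features -> model width
                layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, n_layers)
                self.out = nn.Linear(d_model, n_labels)      # per-frame logits, label 0 = blank

            def forward(self, x):                            # x: (batch, time, n_feats)
                h = self.encoder(self.proj(x))               # positional encoding omitted here
                return self.out(h).log_softmax(-1)           # CTC consumes log-probabilities

        model, ctc = SelfAttentionCTC(), nn.CTCLoss(blank=0)
        x = torch.randn(2, 100, 80)                          # two utterances of 100 frames
        labels = torch.randint(1, 32, (2, 20))               # dummy 20-label transcripts
        log_probs = model(x).transpose(0, 1)                 # CTCLoss expects (time, batch, labels)
        loss = ctc(log_probs, labels, torch.full((2,), 100), torch.full((2,), 20))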

    Transformation Based Interpolation with Generalized Representative Values

    Fuzzy interpolation offers the potential to model problems with sparse rule bases, as opposed to the dense rule bases deployed in traditional fuzzy systems. It thus supports the simplification of complex fuzzy models and facilitates inference when only limited knowledge is available. This paper first introduces the general concept of representative values (RVs) and then uses it to present an interpolative reasoning method that can interpolate fuzzy rules involving arbitrary polygonal fuzzy sets, by means of scale and move transformations. Various interpolation results over different RV implementations are illustrated to show the flexibility and diversity of this method. A realistic application shows that the interpolation-based inference can outperform conventional inference methods.
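
    Since the representative value is the pivot of the whole approach, a toy computation helps fix the idea. The sketch below, with made-up triangular sets, uses the simplest RV definition in this line of work, the average of a set's characteristic points; the paper's point is precisely that other definitions can be substituted here:

        # One RV definition: the average of a polygonal fuzzy set's points.
        def rep_value(points):
            """Representative value of a polygonal fuzzy set, given the
            x-coordinates of its characteristic points, left to right."""
            return sum(points) / len(points)

        # Triangular fuzzy sets written as (left, peak, right) x-coordinates.
        A1, A2 = (0, 5, 6), (11, 13, 18)   # antecedents of two adjacent rules
        A_obs = (7, 8, 9)                  # an observation falling between them

        # Relative placement of the observation between the two rules,
        # measured on representative values; this ratio drives interpolation.
        lam = (rep_value(A_obs) - rep_value(A1)) / (rep_value(A2) - rep_value(A1))
        print(round(lam, 2))               # 0.42: A_obs sits ~42% of the way to A2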

    Fuzzy interpolative reasoning via scale and move transformation

    Interpolative reasoning not only helps reduce the complexity of fuzzy models but also makes inference possible in sparse rule-based systems. This paper presents an interpolative reasoning method based on scale and move transformations. It can be used to interpolate fuzzy rules involving complex polygonal, Gaussian, or other bell-shaped fuzzy membership functions. The method works by first constructing a new inference rule via manipulating two given adjacent rules, and then using scale and move transformations to convert the intermediate inference results into the final derived conclusions. The proposed transformations give the method three advantages: 1) it can handle interpolation of multiple antecedent variables with simple computation; 2) it guarantees the uniqueness as well as the normality and convexity of the resulting interpolated fuzzy sets; and 3) it admits a variety of definitions for representative values, providing a degree of freedom to meet different requirements. Comparative experimental studies are provided to demonstrate the potential of this method.
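
    To make the two-step structure concrete, the sketch below handles triangular sets only; `lam` is assumed to have been computed from representative values as in the earlier sketch, and the final scale rate shown is a stand-in, since deriving it from the antecedent transformation is the part of the method elided here:

        # Step 1: build an intermediate rule from the two neighbouring rules;
        # Step 2: scale (and, in the full method, move) it toward the observation.
        def interpolate(p, q, lam):
            """Pointwise convex combination of two same-arity polygonal sets."""
            return tuple((1 - lam) * a + lam * b for a, b in zip(p, q))

        def scale(points, rate):
            """Scale a set about its representative value by `rate`."""
            rv = sum(points) / len(points)
            return tuple(rv + rate * (x - rv) for x in points)

        A1, A2 = (0, 5, 6), (11, 13, 18)    # antecedents of the two rules
        B1, B2 = (0, 2, 4), (10, 11, 13)    # their consequents
        lam = 0.42                          # observation's placement between the rules

        B_mid = interpolate(B1, B2, lam)    # intermediate consequent
        # In the method proper, the scale/move rates are read off the
        # antecedent transformation and replayed here; 0.3 is a placeholder.
        B_star = scale(B_mid, 0.3)          # final interpolated conclusion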

    Fuzzy interpolation with generalized representative values

    Fuzzy interpolative reasoning offers the potential to model problems using sparse rule bases, as opposed to the dense rule bases deployed in traditional fuzzy systems. It thus supports the simplification of complex fuzzy models in terms of the number of rules and facilitates inference when limited knowledge is available. This paper presents an interpolative reasoning method based on scale and move transformations.

    Scale and move transformation-based fuzzy interpolative reasoning: A revisit

    This paper generalises the previously proposed interpolative reasoning method [15] to cover interpolations involving complex polygonal, Gaussian, or other bell-shaped fuzzy membership functions, a generalisation made possible by the proposed scale and move transformations. The method works by first constructing a new inference rule via manipulating two given adjacent rules, and then using scale and move transformations to convert the intermediate inference results into the final derived conclusions. The generalised method has two advantages thanks to the proposed transformations: 1) it can easily handle interpolation of multiple antecedent variables with simple computation; and 2) it guarantees the uniqueness as well as the normality and convexity of the resulting interpolated fuzzy sets. Numerical examples are provided to demonstrate the use of this method.

    Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image; image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K, and MS COCO), where it outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, achieving significant performance improvements over state-of-the-art methods that directly optimize the ranking objective function for retrieval. The project page of this work is www.stat.ucla.edu/~junhua.mao/m-RNN.html. Comment: A simple strategy is added that significantly boosts performance on the image captioning task; more details are given in Section 8 of the paper. The code and related data are available at https://github.com/mjhucla/mRNN-CR. arXiv admin note: substantial text overlap with arXiv:1410.1090.
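
    The key structural idea, a multimodal layer where the word embedding, recurrent state, and CNN image feature are projected into one shared space before the word softmax, is easy to sketch. The PyTorch fragment below is illustrative only; the dimensions and the additive fusion are placeholder assumptions rather than the paper's exact configuration:

        # Sketch of an m-RNN-style multimodal layer (all sizes illustrative).
        import torch
        import torch.nn as nn

        class MRNNSketch(nn.Module):
            def __init__(self, vocab=10000, d_emb=256, d_rnn=256, d_img=4096, d_mm=512):
                super().__init__()
                self.embed = nn.Embedding(vocab, d_emb)
                self.rnn = nn.RNN(d_emb, d_rnn, batch_first=True)
                self.w_word = nn.Linear(d_emb, d_mm)   # word-embedding path
                self.w_rnn = nn.Linear(d_rnn, d_mm)    # recurrent-state path
                self.w_img = nn.Linear(d_img, d_mm)    # image path (e.g. a CNN fc feature)
                self.out = nn.Linear(d_mm, vocab)      # next-word logits

            def forward(self, words, img):             # words: (B, T) ids; img: (B, d_img)
                e = self.embed(words)
                h, _ = self.rnn(e)
                m = torch.tanh(self.w_word(e) + self.w_rnn(h)
                               + self.w_img(img).unsqueeze(1))  # broadcast image over time
                return self.out(m)                     # per-step distribution over words

        logits = MRNNSketch()(torch.randint(0, 10000, (2, 7)), torch.randn(2, 4096))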